Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/add syracuse qdr retriever #1

Merged
merged 24 commits into from
Jun 18, 2024
Merged

Conversation

george42-ctds
Copy link
Collaborator

@george42-ctds george42-ctds commented May 30, 2024

JIRA ticket: HP-1459

New Features

  • Add Syracuse QDR retriever function

Breaking Changes

Bug Fixes

Improvements

Dependency updates

Deployment changes

@george42-ctds george42-ctds marked this pull request as draft May 30, 2024 22:31
@george42-ctds george42-ctds marked this pull request as ready for review June 3, 2024 14:56
@george42-ctds george42-ctds marked this pull request as draft June 3, 2024 18:52
@george42-ctds
Copy link
Collaborator Author

We should merge PR #2, with Github actions, before merging this PR. Moving back to draft for now.

@george42-ctds george42-ctds marked this pull request as ready for review June 3, 2024 21:09
Dict of download status
"""

if not Path(download_path).exists():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: would it be better to default to . if download_path is not provided by the user?

# unpack the zip file
try:
logger.debug(f"Ready to unpack {filepath}.")
unpackage_object(filepath=filepath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are zipping all the files we are intended to download into a zip and then unzip it at the end? I'm curious why are we doing this, like what is the benefit of this solution vs downloading each individual file separatly and put them into a dedicated directory?

Copy link
Collaborator Author

@george42-ctds george42-ctds Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The QDR provides zip files for the GET:study_id and POST:bulk file_ids. We can also get single files via a GET request that is not a zip file. I can add a flag for downloading file_ids in bulk (zip) or one at a time (not zipped).

For the latter, should we rename the file after download? For example, if we download file_id=45139 should we rename the downloaded file from ‘45139’ to ‘Baum-et-al_INTERVIEW_GUIDE.pdf’? Or just log the filename at download?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not use the bulk endpoint for downloading using file IDs, I remember they said there are some size limitations on that endpoint. Even though we are very unlikely to hit by that limitation with the current HEAL studies they have on QDR, I think it is still better user experience to download them one by one by using their file IDs. So we don't really need to worry about the bulk file download endpoint for now
But we should definitely rename each individual files using their original filenames

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to download single files from the GET:datafile/{id} endpoint. The data get written into the filename that is listed in the Content-Disposition response header.

Copy link

github-actions bot commented Jun 6, 2024

The style in this PR agrees with black. ✔️

This formatting comment was generated automatically by a script in uc-cdis/wool.

Copy link
Contributor

@mfshao mfshao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good! 👍 great work

@mfshao mfshao merged commit 6ea6d75 into master Jun 18, 2024
7 checks passed
@mfshao mfshao deleted the feat/add-syracuse-qdr-retriever branch June 18, 2024 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants